ABSTRACT


The popularity of face recognition systems has increased because image acquisition is
non-invasive, which has led to widespread applications. Face ageing is one major factor that
influences the performance of face recognition algorithms. In this study, the authors present a
comparative study of the two most widely accepted and used face ageing datasets (FG-Net and
Morph II). These datasets were used to simulate age invariant face recognition (AIFR) models.
Four types of noise were added to the two face ageing datasets at the preprocessing stage. The
addition of noise at the preprocessing stage served as a data augmentation technique that
increased the number of sample images available for deep convolutional neural network (DCNN)
experimentation and improved the proposed AIFR model and the extraction of trait-ageing
features. The proposed AIFR models were developed with the pre-trained Inception-ResNet-v2
deep convolutional neural network architecture. On testing and comparing the models, the
results revealed that FG-Net is more efficient than Morph, with percentage differences of 0.15%
in accuracy, 71% in loss function, 39% in mean square error (MSE) and -0.63% in mean absolute
error (MAE).

1. INTRODUCTION

This paper carries out a comparative analysis of two augmented datasets (the FG-Net dataset and
the Morph dataset). Their performances (accuracy, loss function, mean square error (MSE), and
mean absolute error (MAE)) for trait-ageing invariant face recognition (AIFR) systems are
compared. The significance of the study is that both datasets are used for AIFR. Data
augmentation via the addition of noise to both datasets at the preprocessing phase greatly
increases the accuracy and other performance parameters of AIFR.
- Literature review 

Many comparisons exist in the literature between the performances of augmented datasets on
age-invariant recognition systems. The augmented datasets are usually used independently of each
other to verify the invariance of designed face recognition systems. Two of the most common face
image datasets used in age-invariant face recognition, FG-NET and MORPH [1], are usually at the
centre of comparisons made to check the performance of age-invariant face recognition systems.
The goal is to have a good performance for all datasets used for face recognition. The results
obtained from augmented standard datasets [2] for face images usually reflect the robustness of
the designed face recognition models to variations in pose, illumination, shape, and texture.
There are customarily several ways of augmenting the datasets, depending on the goal of the
researcher and the challenge that needs to be solved.

Comparisons between the performance of augmented datasets on age-invariant face recognition 
systems extend to niche applications like finding missing children who are discovered at a much later time 
(longer than ten years) [3]. The importance of comparisons, especially for niche applications, is emphasized 
in [4]. The factors that degrade the performance of face recognition systems are so numerous that it is 
sensible to have as many augmented datasets from as many providers as possible. Evidence of
the robustness of any age-invariant face recognition system is usually presented after it has passed the
rigorous condition of being subjected to a variety of augmented datasets [5]. The evidence is generally in the
form of performance metrics like accuracy [6]. These performance metrics are used to gauge how well face 
recognition systems can accurately recognize face images of various subjects regardless of the source of the 
image, the noise added to the image and other forms of augmentation. 

The region of the face used [7] to develop the age-invariant face recognition model plays a 
significant role in the design of age-invariant face recognition systems that are robust. The region of the face, 
when extracted from various datasets, could give non-identical performances on the designed face 
recognition model. This observation extends to other face recognition models designed to counter the
negative effect of trait ageing. At the centre of comparisons of augmented datasets is accuracy [8], [9], that is, the
precision with which the designed face recognition model can identify subjects' facial images after it has
been designed to discriminate between real and generated images, estimate age and identify subjects. New
applications of age-invariant face recognition systems like soft biometrics [10] take the comparison between 
augmented datasets seriously. The verification/identification process is thoroughly confirmed for as many 
augmented datasets as possible to verify the accuracy of the face recognition system. The algorithms used to 
develop age-invariant face recognition systems such as support vector machine (SVM) [11], principal 
component analysis (PCA) and the like, perform differently for various forms of augmented datasets. The 
authors in [12] tested the recognition system performance of a modelled age-invariant face recognition 
system after passing face images through the designed and optimized adaptive neuro-fuzzy inference system 
(ANFIS) classifier. The reviews made in [13] and [14] give in-depth studies of the performances of various 
augmented datasets on designed age-invariant face recognition systems. The studies focused on the challenges
of face recognition as it relates to the verification of designed face recognition systems using different 
augmented datasets. The studies were able to identify the challenges faced by adaptive and age-invariant face 
recognition systems through extensive and thorough comparisons using different augmented datasets. 

- FG-NET dataset specifications and complexities

The FG-NET database contains 1002 images of 82 different persons, with ages spanning from birth
to 69 years. Most of the images fall within the under-41 (<41 years) age group. Some of the
pictures of subjects in the FG-NET database were taken digitally, while others are scanned copies
of original photographs taken from the personal collections of the subjects. The quality of the
images in the FG-NET database depends significantly on the skill of the photographer, the
condition of the photograph, the sophistication of the imaging tools used and the durability of
the photographic paper found in personal collections. Thus there are variations in sharpness,
illumination, resolution, background, facial expression, camera angles, and facial hair. These
variations make the FG-NET database a good one for AIFR research. Samples of the same subject
(person) at ages ranging from 2 to 43 are shown in Figure 1.

Figure 1. Image age progression of FG-NET dataset subject 1 at ages 1, 5, 8, 10, 14, 16, ..., 28, 29, 33, 40, 43


- Morph dataset II specifications and complexities 
A longitudinal face database, MORPH Album 2 is a well-known publicly available dataset for face
recognition research. The face images in the MORPH database vary in age, sex, and background. The
MORPH dataset was collected in uncontrolled environments (the pictures were taken in real-world
conditions) and thus has a distinctive range of facial expressions. The photographs in the MORPH
database were taken over a period of four years and the database is regularly updated. MORPH
Album 2 contains 55,134 face images of 13,000 subjects, along with metadata showing that the
majority of the images were acquired within a period of four years. Example images and age
progressions from MORPH Album 2 are shown in Figure 2: Figure 2(a) for a white male and
Figure 2(b) for an African-American female.

Figure 2. Image progression in MORPH Album 2: (a) white male (DOB: 9/29/1980), images dated
Dec 2005, Mar 2006, Jul 2006, Sept 2006 and Oct 2007, at ages 25, 25, 25, 25 and 27;
(b) African-American/Black female (DOB: 7/25/1971), images dated Mar 2003, Jun 2003, Oct 2003,
Sept 2004 and Mar 2007, at ages 31, 31, 31, 33 and 35


- Comparative analysis of Fg-Net ageing dataset and morph dataset II 

One remarkable dissimilarity between the FG-Net and the Morph datasets is that children dominate
the photos in FG-Net. In contrast, most pictures in Morph are of adults [15]. Also, the age gap
between the images of the same subject in the FG-Net dataset is significantly wider than in the
Morph dataset, where it is relatively small [16], as shown in Table 1. Besides, FG-Net contains
subjects from one Caucasian race, whereas the Morph dataset contains the Caucasoid, Negroid, and
Mongoloid races [17]. Furthermore, FG-Net has 1002 images (samples) of 82 subjects, while Morph
has 55,134 images of 13,658 subjects [18]-[22]. Details of both datasets are shown in Table 2 and
Table 3, and Table 4 gives the number of Morph facial images per decade of life. The similarity is
that both datasets contain face images of the same subjects at various age gaps. This is the sole
reason both are regarded as ageing datasets and can be compared experimentally [5]-[8].


Table 1. Comparison of FGNET and MORPH ageing datasets [23] 


Database                  Images                                              Subjects                    Age range             Resolution
FG-NET ageing database    1,002 high-resolution colour or grey-scale images   82 multiple-race subjects   0 to 69 years         High; 294 images of females and 430 images of males
MORPH database Album 1    1,724 face images                                   515                         46 days to 29 years   240x200 pixels
MORPH database Album 2    more than 20,000                                    4,000                       16 to 77 years        -


Table 2. Number of sample images in each age group of FG-NET dataset [24] 
Age group (in years)    Number of samples


0-9 371 
10-19 339 
20-29 144 
30-39 79 
40-49 46 
50-59 15 
60-69 8 
Total 1,002 


Table 3. Number of facial images based on gender and ethnicity from MORPH dataset [25] 


Gender   African   European   Asian   Hispanic   Other   Total
Male     36,832    7,961      141     1,667      44      46,645
Female   5,757     2,598      13      102        19      8,489
Total    42,589    10,559     154     1,769      63      55,134

Table 4. Number of Morph facial images by decade of life [26]

Age group (in years)   Number of samples (male)   Number of samples (female)   Total number of samples
< 20                   6,638                      831                          7,469
20-29                  14,016                     2,309                        16,325
30-39                  12,447                     2,910                        15,357
40-49                  10,062                     1,988                        12,050
50+                    3,482                      451                          3,933
Total                  46,645                     8,489                        55,134


2. RESEARCH METHOD 
2.1. Pre-processing the FG-NET database for deep learning 
A very large amount of data is needed to train a deep neural network. The FG-NET dataset has only
10-15 face images of each subject at different ages, amounting to 1002 images. This is too small
for deep neural network applications. We therefore preprocessed the images in the database by
adding noise to them, which increased the total number of pictures available for deep learning.
The augmentation of the dataset was done at the preprocessing stage to allow for improved feature
extraction. The following steps were followed to augment the FG-NET dataset with noise.
a. Convert all images to three channels with matrix entries for red, green and blue (RGB) for uniformity.
b. Use the Viola-Jones face detector to crop all face images and remove all background details from them,
for richer feature extraction by the proposed deep learning model.
c. Create five different versions of each image, namely the original and one noisy copy per noise type:

— No noise (original cropped image preserved) 

— Poisson noise 

— Salt and pepper noise 

— Speckle noise 

— Gaussian noise 

The number of images available for deep learning experimentation was increased from 1002 to 5010,
with up to about 45-90 images per subject. The addition of noise also helped the deep neural
network to extract richer features from the face images for AIFR. Algorithm 1 shows the noise
injection (data augmentation) procedure.


2.2. Pre-processing the morph database for deep learning 
A very large amount of data is needed to train a deep neural network. The MORPH Album 2 dataset
has only 1-5 face images of each subject at different ages, amounting to 55,134 images of about
13,000 subjects. This is too small for deep neural network applications. We therefore
preprocessed the images in the database by adding noise to them, which increased the total number
of pictures available for deep learning. The augmentation of the dataset was done at the
preprocessing stage to allow for improved feature extraction. The following steps were followed to
augment the MORPH Album 2 dataset with noise:
a. Convert all images to three channels with matrix entries for red, green and blue (RGB) for uniformity.
b. Use the Viola-Jones face detector to crop all face images and remove all background details from them,
for richer feature extraction by the proposed deep learning model.
c. Create five different versions of each image, namely the original and one noisy copy per noise type:

— No noise (original cropped image preserved) 

— Poisson noise 

— Salt and pepper noise 

— Speckle noise 

— Gaussian noise 

The number of images available for deep learning experimentation was increased from 55,134 to
about 275,000, with up to about 5-25 images per subject. The addition of noise also helped the
deep neural network to extract richer features from the face images for AIFR.
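
The augmented totals follow from keeping the original image and adding four noisy copies of it. A minimal Python sketch of that arithmetic is given here; it assumes the image totals reported in the dataset descriptions (1,002 for FG-NET and 55,134 for MORPH Album 2) and that every image yields exactly one cropped face. The function name `augmented_total` is hypothetical.

```python
# Versions kept per image: the original plus one copy per noise type.
VERSIONS = ["none", "poisson", "salt_and_pepper", "speckle", "gaussian"]


def augmented_total(n_images: int, n_versions: int = len(VERSIONS)) -> int:
    """Number of images after augmentation, assuming one cropped face per image."""
    return n_images * n_versions


# Image totals as reported in the dataset descriptions.
fgnet_total = augmented_total(1002)
morph_total = augmented_total(55134)
print(fgnet_total, morph_total)
```

Under these assumptions the FG-NET total matches the 5010 images reported, and the MORPH Album 2 total comes to a little over 275,000.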

Algorithm 1: Data augmentation

faceDetector = vision.CascadeObjectDetector()      % Viola-Jones face detector
d = dir('images')
for i = 3 to length(d) do                           % entries 1 and 2 are '.' and '..'
    dd = contents of folder d(i)
    for j = 3 to length(dd) do
        I = image dd(j)
        if I has fewer than 3 channels then
            I = cat(3, I, I, I)                     % replicate the single channel to obtain RGB
        end if
        bbox = step(faceDetector, I)                % one row per detected face
        for f = 1 to number of rows of bbox do
            J = imcrop(I, bbox(f,:))
            Save J                                  % no noise: original cropped image preserved
            % Add Gaussian noise to cropped image and save
            K = imnoise(J, 'gaussian')              % settings: mean = 0, variance = 1.4
            % Add Poisson noise to cropped image and save
            K = imnoise(J, 'poisson')               % settings: mean = 0.8, probability of photon noise = 0.2
            % Add salt and pepper noise to cropped image and save
            K = imnoise(J, 'salt & pepper')         % settings: salt probability = 0.01, pepper probability = 0.15
            % Add speckle noise to cropped image and save
            K = imnoise(J, 'speckle')               % settings: mean = 0.9, variance = 0.1
        end for
    end for
end for

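
For readers without MATLAB, the noise-injection step can be illustrated with a small standard-library Python sketch. It is not the authors' implementation: it works on a greyscale image stored as a nested list of intensities in [0, 1], all function names are hypothetical, and the noise parameters (`var`, `density`, `peak`) are illustrative values rather than the settings listed in Algorithm 1.

```python
import math
import random


def clip01(v):
    """Keep an intensity inside the valid [0, 1] range."""
    return min(1.0, max(0.0, v))


def gaussian_noise(img, rng, mean=0.0, var=0.01):
    """Additive noise: pixel + n, with n drawn from N(mean, var)."""
    sd = math.sqrt(var)
    return [[clip01(p + rng.gauss(mean, sd)) for p in row] for row in img]


def speckle_noise(img, rng, var=0.05):
    """Multiplicative noise: pixel + pixel * n, with n drawn from N(0, var)."""
    sd = math.sqrt(var)
    return [[clip01(p + p * rng.gauss(0.0, sd)) for p in row] for row in img]


def salt_and_pepper_noise(img, rng, density=0.05):
    """Set roughly `density` of the pixels to 0 (pepper) or 1 (salt)."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            u = rng.random()
            if u < density / 2:
                new_row.append(0.0)
            elif u < density:
                new_row.append(1.0)
            else:
                new_row.append(p)
        out.append(new_row)
    return out


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_noise(img, rng, peak=30.0):
    """Treat pixel * peak as an expected photon count and resample it."""
    return [[clip01(poisson_sample(p * peak, rng) / peak) for p in row] for row in img]


def augment(img, seed=0):
    """Return the five versions of one cropped face: original plus four noisy copies."""
    rng = random.Random(seed)
    return {
        "none": [row[:] for row in img],
        "poisson": poisson_noise(img, rng),
        "salt_and_pepper": salt_and_pepper_noise(img, rng),
        "speckle": speckle_noise(img, rng),
        "gaussian": gaussian_noise(img, rng),
    }


face = [[0.0, 0.5], [1.0, 0.25]]  # toy 2x2 "image"
versions = augment(face)
print(sorted(versions))
```

Fixing the seed makes the augmentation reproducible, which is useful when two datasets are to be compared under the same noise draws.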


2.3. Feature extraction and classification using convolutional neural network 

The Inception-ResNet-v2 convolutional neural network (CNN) was trained on over a million images
from the ImageNet database. These images form part of the database for the ImageNet Large-Scale
Visual Recognition Challenge. Inception-ResNet-v2 has 164 layers and can classify images into 1000
object classes. The CNN accepts images of size 299x299 for classification. Inception-ResNet-v2 was
used in this study to learn features for age invariant face recognition through a process called
transfer learning. Transfer learning is the process of adapting a pre-trained neural network to
another task for which it was not originally trained. Transfer learning was used to learn age
invariant features from the FG-NET and MORPH datasets for AIFR. Figure 3 and Table 5 show a summary
of the network architecture of Inception-ResNet-v2. To use the Inception-ResNet-v2 network, MATLAB
R2018b was installed, and the installer of the deep learning toolbox model for the
Inception-ResNet-v2 network was downloaded from [27] and run to install the network in MATLAB
R2018b.

Figure 3. Block diagram of the adapted Inception-ResNet-v2 architecture network [28]


Table 5. Architecture of adapted Inception-ResNet-v2 network [29]

Block  Type             Repeat  Depth  Filter/Stride  Output size  Branch 1      Branch 2          Branch 3
1      Convolution                     3x3/2          149x149x32   (32)
1      Convolution                     3x3/1          147x147x32   (32)
1      Convolution                     3x3/1          147x147x64   (64)
1      Max Pooling                     3x3/2          73x73x160
1      Convolution                     3x3/2          73x73x160    (96)
1      Convolution                     3x3/1          71x71x192    (64, 96)      (64, 64, 64, 96)
1      Convolution                     3x3/2          35x35x384    (192)
1      Max Pooling                     3x3/2          35x35x384
2      Inception-A      5       3                     35x35x256    (32)
3      Reduction-A      1       3                     17x17x256    (384)         (256, 256, 384)
4      Inception-B      10      3                     17x17x896    (192)         (128, 160, 192)
5      Reduction-B      1       3                     8x8x1792     (256, 384/2)  (256, 288/2)      (256, 288, 320/2)
6      Inception-C      5       3                     8x8x1792     (192)         (192, 224, 256)
7      Average Pooling                 8x8            1792
8      Dropout                         Keep 0.8       1792
9      Softmax                         Classifier     82/13000
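
The spatial sizes in the Output size column can be checked with the usual convolution arithmetic, out = floor((n + 2p - k) / s) + 1. The Python sketch below traces the 299x299 input through the 3x3 layers of Table 5. It assumes unpadded convolutions and pooling everywhere except the third convolution, which is assumed to be padded so that it preserves size; the layer list and function names are illustrative, not taken from the toolbox.

```python
def out_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# (description, kernel, stride, padding) for the main path of Table 5.
LAYERS = [
    ("conv 3x3/2", 3, 2, 0),
    ("conv 3x3/1", 3, 1, 0),
    ("conv 3x3/1, size-preserving (assumed padding 1)", 3, 1, 1),
    ("max pool 3x3/2", 3, 2, 0),
    ("conv 3x3/1", 3, 1, 0),
    ("conv 3x3/2", 3, 2, 0),
    ("Reduction-A (stride 2)", 3, 2, 0),
    ("Reduction-B (stride 2)", 3, 2, 0),
]


def trace_sizes(n: int = 299):
    """Return the spatial size after each layer in LAYERS."""
    sizes = []
    for _, kernel, stride, padding in LAYERS:
        n = out_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


print(trace_sizes())  # [149, 147, 147, 73, 71, 35, 17, 8]
```

The trace reproduces the 149, 147, 73, 71, 35, 17 and 8 pixel sizes listed in the table, which is why input images must be resized to 299x299 before they are passed to the network.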


2.4. Training the deep learning model 


The preprocessed FG-NET dataset was used to retrain the Inception-ResNet-v2 neural network for
AIFR. This was made possible by transfer learning. The transfer learning process is enumerated below:
a. Inception-ResNet-v2 is run in MATLAB.
b. Training preferences are specified.
c. The transfer learning process is started using the augmented FG-NET dataset.
d. The validity of the transfer learning process is checked using the validation set.
e. The network's accuracy is estimated.

The preprocessed FG-NET images are loaded into MATLAB using the image datastore object.
The images are then split into a validation set (20% of the images) and a train set (80%).
The images in the train set are resized to 299x299 for compatibility with Inception-ResNet-v2.
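
The 80/20 split can be sketched in Python as follows. This is only an illustration: it assumes the split is made separately for each subject so that every identity appears in both sets, which the text does not state, and the file names and the function `split_per_label` are hypothetical.

```python
import random
from collections import defaultdict


def split_per_label(samples, train_fraction=0.8, seed=0):
    """Split (path, label) pairs into train and validation sets, label by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    train, val = [], []
    for label in sorted(by_label):
        paths = by_label[label][:]
        rng.shuffle(paths)  # random assignment within each subject
        cut = round(train_fraction * len(paths))
        train += [(p, label) for p in paths[:cut]]
        val += [(p, label) for p in paths[cut:]]
    return train, val


# Toy dataset: 3 subjects with 10 augmented images each (hypothetical file names).
samples = [(f"subject{s:02d}_img{i:02d}.jpg", s) for s in range(3) for i in range(10)]
train_set, val_set = split_per_label(samples)
print(len(train_set), len(val_set))  # 24 6
```

Splitting per subject keeps the class proportions the same in both sets, which matters for FG-NET because the number of images per subject varies.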


2.5. Using the trained deep learning model for face recognition in FG-NET and Morph datasets 
The retrained neural network was used for testing images from the Morph Album 2 dataset using the
following process:
a. An image is read from the MORPH Album 2 dataset.
b. All images are converted into an RGB matrix.
c. The Viola-Jones algorithm is used to detect and crop faces.
d. All images are resized to 299x299.
e. The retrained Inception-ResNet-v2 neural network is loaded.
f. The image is loaded into the retrained neural network for classification.
g. The predicted class is compared to the ground truth.

3. RESULTS AND DISCUSSION
3.1. Evaluation methodology 

MAE, MSE, Accuracy, and Loss function were used to check the performance of the proposed 
AIFR model [30]-[35]. 


3.1.1. Accuracy 

Accuracy is derived from the true positive (TP), true negative (TN), false positive (FP), and false 
negative (FN) values as shown in (1). True positives are correct positive classifications. True negatives are 
correct negative classifications. False positives are wrong positive classifications and false negatives are 
wrong negative classifications. 


$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{1}$$
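
As a worked illustration of (1), the short Python function below computes accuracy from the four counts. The counts in the example are made up for illustration and are not results from this study.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy in percent from the confusion-matrix counts, as in (1)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one classification is required")
    return 100.0 * (tp + tn) / total


# Illustrative counts: 95 of 100 classifications are correct.
print(accuracy(tp=45, tn=50, fp=3, fn=2))  # 95.0
```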


3.1.2. Mean squared error 

The mean squared error (MSE) is a measure of prediction error that is never negative; a score
closer to zero is better. Here N is the number of iterations, $f_i$ is the training loss value and
$y_i$ is the testing loss value at iteration $i$. MSE is calculated as presented in (2) [36], [37].


$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}(f_i - y_i)^2 \tag{2}$$

The MSE is the average $\left(\frac{1}{N}\sum_{i=1}^{N}\right)$ of the squares of the errors $(f_i - y_i)^2$.
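
A minimal Python sketch of (2) follows. The two short lists stand in for per-iteration training and testing loss values; they are illustrative numbers, not values from the experiments.

```python
def mean_squared_error(f, y):
    """Average of the squared differences between two equally long sequences."""
    if len(f) != len(y) or not f:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((fi - yi) ** 2 for fi, yi in zip(f, y)) / len(f)


train_loss = [0.20, 0.10, 0.05]  # illustrative f_i
test_loss = [0.25, 0.10, 0.00]   # illustrative y_i
print(mean_squared_error(train_loss, test_loss))
```

Because the differences are squared, a single large gap between training and testing loss dominates the score.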


3.1.3. Mean absolute error 

The mean absolute error (MAE) is a measure of the disparity between two values, in this case
between $y_i$, the training loss value, and $\hat{y}_i$, the testing loss value; n is the number
of iterations. MAE is calculated as presented in (3) [38].


$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert \tag{3}$$

The MAE is the mean of the absolute errors $\lvert y_i - \hat{y}_i\rvert$.
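
The corresponding Python sketch of (3) is shown below, again with illustrative loss values rather than experimental ones.

```python
def mean_absolute_error(y, y_hat):
    """Average of the absolute differences between two equally long sequences."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


train_loss = [0.20, 0.10, 0.05]  # illustrative y_i
test_loss = [0.25, 0.10, 0.00]   # illustrative predicted counterpart
print(mean_absolute_error(train_loss, test_loss))
```

Unlike the MSE, the MAE weights every gap linearly, so it is less sensitive to a few large deviations.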


3.1.4. Loss function 

Categorical cross-entropy is a loss function used to measure the difference between two
probability distributions. This difference is computed for each iteration on the training and
testing datasets. It is calculated as shown in (4) [39].


$$L_{cross}(y, \hat{y}) = -\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\,\log(\hat{y}_{ij}) \tag{4}$$


Where x is the input value, y is the true value, $\hat{y}$ is the value predicted by the method,
N is the number of iterations and C is the number of class labels. Wen et al. [40] recommended a
loss function called centre loss in addition to the categorical cross-entropy loss. The idea is to
increase the discriminative power of the learned features by reducing the intra-class variations.
The centre loss function is as shown in (5).


$$L_{center} = \frac{1}{2}\sum_{i=1}^{N} \lVert x_i - c_{y_i} \rVert_2^2 \tag{5}$$


Here $c_{y_i}$ is the centre of the features of class $y_i$ and N is the number of iterations.
Wen et al. [40] observed that (5) does not accomplish the expected result as it stands, and made
two modifications to resolve this problem. The first modification is to update the centres based
on a mini-batch instead of the entire dataset. The second modification is the introduction of two
new quantities, $\alpha$ and the $\delta$-function. $\alpha$ is used to regulate the learning rate
of the centres, and the $\delta$-function is a Boolean that returns 1 if the condition is true and
0 if the condition is false. Equation (6) defines the update of the class centre.


$$\Delta c_j = \frac{\sum_{i=1}^{N} \delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{N} \delta(y_i = j)} \tag{6}$$

The new centre of each class is as shown in (7):

$$c_j^{t+1} = c_j^{t} - \alpha\,\Delta c_j^{t} \tag{7}$$
Here $\alpha \in [0,1]$. Wen et al. [40] introduce $\lambda$ to balance the two loss functions in
the total loss function. The complete function is shown in (8).


$$L = L_{cross} + \lambda L_{center} \tag{8}$$

If $\lambda$ is set to 0, the total loss function reduces to the categorical cross-entropy function.
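
The loss terms described in this subsection can be sketched in plain Python to make their interplay concrete. The sketch is illustrative only: labels are one-hot rows, features are short lists, the cross-entropy is summed over the samples (dividing by N gives the averaged variant), and all function names and numbers are hypothetical.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Categorical cross-entropy summed over samples and classes."""
    return -sum(
        t * math.log(max(p, eps))
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )


def centre_loss(features, labels, centres):
    """Half the summed squared distance of each feature to its class centre."""
    return 0.5 * sum(
        sum((x - c) ** 2 for x, c in zip(f, centres[label]))
        for f, label in zip(features, labels)
    )


def centre_deltas(features, labels, centres):
    """Per-class update: sum of (centre - feature) over members, divided by 1 + count."""
    deltas = {}
    for j, c in centres.items():
        members = [f for f, label in zip(features, labels) if label == j]
        summed = [sum(c[d] - f[d] for f in members) for d in range(len(c))]
        deltas[j] = [s / (1 + len(members)) for s in summed]
    return deltas


def update_centres(centres, deltas, alpha=0.5):
    """New centre = old centre - alpha * delta, with alpha in [0, 1]."""
    return {j: [c - alpha * d for c, d in zip(centres[j], deltas[j])] for j in centres}


def total_loss(l_cross, l_centre, lam):
    """Weighted sum of the two losses; lam = 0 leaves only the cross-entropy."""
    return l_cross + lam * l_centre


# Toy mini-batch with two classes and two-dimensional features.
y_true = [[1, 0], [0, 1]]
y_pred = [[0.9, 0.1], [0.2, 0.8]]
features = [[0.2, 0.0], [1.0, 0.6]]
labels = [0, 1]
centres = {0: [0.0, 0.0], 1: [1.0, 1.0]}

l_cross = cross_entropy(y_true, y_pred)
l_centre = centre_loss(features, labels, centres)
centres = update_centres(centres, centre_deltas(features, labels, centres))
print(l_cross, l_centre, total_loss(l_cross, l_centre, lam=0.01), centres)
```

After the update each centre has moved part of the way towards the features of its own class, which is the mechanism by which the centre loss reduces intra-class variation.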


3.2. Results and discussion 

This section deals with the results and comparative analysis of the performances of the augmented
datasets (FG-Net and Morph II) for trait-ageing invariant face recognition systems. Figure 4
compares the training and testing accuracies obtained with the FG-Net and Morph datasets; the
FG-Net dataset outperforms the Morph dataset by 0.15% in mean testing accuracy. Figure 5 compares
the training and testing loss (error) of the two datasets; the mean testing loss of FG-Net is
lower than that of Morph, a difference of 71%. Table 6 summarizes the performance of the augmented
FG-Net and Morph datasets. These results imply that a model trained on the FG-Net dataset will
perform better than one trained on the Morph dataset when deployed in an age invariant face
recognition (AIFR) system.


Table 6. The performance of augmented datasets (FG-Net and Morph II)


Variable                     FG-Net Dataset   Morph Dataset   Percentage Difference
Accuracy (Testing)           99.94%           99.79%          0.15%
Loss Function (Testing)      0.0039%          0.0067%         71%
Mean Square Error (MSE)      0.0155           0.0094          39%
Mean Absolute Error (MAE)    0.0634           0.0638          -0.63%
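
Table 6 does not state how the Percentage Difference column was computed. As an assumption only, the Python sketch below treats the accuracy row as a plain difference in percentage points and the other rows as (FG-Net - Morph) / FG-Net. Under that reading the magnitudes come out at about 71.8%, 39.4% and 0.63%, close to the reported figures, with the loss row negative because the FG-Net loss is the smaller of the two. The function name `relative_difference` is hypothetical.

```python
def relative_difference(fgnet: float, morph: float) -> float:
    """(FG-Net - Morph) / FG-Net, expressed in percent (assumed formula)."""
    return 100.0 * (fgnet - morph) / fgnet


# Values copied from Table 6.
accuracy_gap = 99.94 - 99.79                      # percentage points
loss_diff = relative_difference(0.0039, 0.0067)   # FG-Net loss is lower
mse_diff = relative_difference(0.0155, 0.0094)    # FG-Net MSE is higher
mae_diff = relative_difference(0.0634, 0.0638)    # FG-Net MAE is slightly lower

print(round(accuracy_gap, 2), round(loss_diff, 1), round(mse_diff, 1), round(mae_diff, 2))
```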


Furthermore, Figure 4 shows in graphical form the training and testing accuracies for FG-Net and
Morph, while Figure 5 shows the training and testing loss (error function) for the two datasets.
Figure 6 shows in graphical form the squared error results for FG-Net and Morph. Finally, Figure 7
shows in graphical form the absolute error results for FG-Net and Morph.

Figure 4. FG-Net and Morph training and testing accuracies results comparative analysis (the result is best
viewed in colour)

Figure 5. FG-Net and Morph training and testing loss (error function) results in comparative analysis

Figure 6. FG-Net and Morph squared error results comparative analysis (the result is best viewed in colour)

Figure 7. FG-Net and Morph absolute error results comparative analysis (the result is best viewed in colour)

4. CONCLUSION

This paper compared two of the most widely accepted and used face ageing datasets (FG-NET and
Morph II). These datasets were used to simulate age invariant face recognition (AIFR) models. The
obtained results show that the performances obtained with FG-Net and Morph are similar, and the
small difference may be due to the randomness involved in augmenting the datasets.